Find keyword values from PDF [closed]
Posted by JukkaA on Server Fault
Published on 2012-09-27T13:42:49Z
I have a lot of PDF reports that I need to index. They're mostly "text-based PDFs", not scanned images. I know they all contain an account number in a fixed format, such as 123456AAAAA, plus other keyword information (addresses, customer names, etc.) that is needed to index these files. Basically, if the file is ab.pdf, I need to create ab.txt that contains:
ACC=123456AAAA Customer=John Doe Date=20120808
What would be the best software/solution to generate indexing information for these?
I know there's pdftotext, but piping its output through various grep/awk commands feels like a hack. It would be nice to be able to specify an area of the PDF in which to search for the account number, and to specify the format it is in.
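To make the requirement concrete, here is a rough Python sketch of the kind of tool described: it asks pdftotext for the text of one rectangular area of a page (using its -x, -y, -W and -H crop options) and then applies a format pattern to it. Everything specific is an assumption for illustration only: the account pattern (six digits followed by four or five capital letters), the "Customer:" and "Date:" labels, the date layout in the report, and the helper names region_text, index_line and write_index.

```python
import re
import subprocess
from pathlib import Path

# Assumed formats; adjust these to the real report layout.
ACCOUNT_RE = re.compile(r"\b(\d{6}[A-Z]{4,5})\b")          # e.g. 123456AAAA
CUSTOMER_RE = re.compile(r"Customer:\s*(.+)")               # hypothetical label
DATE_RE = re.compile(r"Date:\s*(\d{4})-(\d{2})-(\d{2})")    # hypothetical label/format


def region_text(pdf_path, x, y, width, height, page=1):
    """Return the text pdftotext finds in one rectangular area of one page."""
    cmd = [
        "pdftotext", "-layout",
        "-f", str(page), "-l", str(page),        # restrict to a single page
        "-x", str(x), "-y", str(y),              # top-left corner of the crop area
        "-W", str(width), "-H", str(height),     # size of the crop area
        str(pdf_path), "-",                      # "-" writes the text to stdout
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def index_line(text):
    """Build an 'ACC=... Customer=... Date=...' line from extracted text."""
    fields = []
    match = ACCOUNT_RE.search(text)
    if match:
        fields.append("ACC=" + match.group(1))
    match = CUSTOMER_RE.search(text)
    if match:
        fields.append("Customer=" + match.group(1).strip())
    match = DATE_RE.search(text)
    if match:
        fields.append("Date=" + "".join(match.groups()))
    return " ".join(fields)


def write_index(pdf_path, **region):
    """Write ab.txt next to ab.pdf and return the index line."""
    pdf_path = Path(pdf_path)
    line = index_line(region_text(pdf_path, **region))
    pdf_path.with_suffix(".txt").write_text(line + "\n")
    return line
```

A call such as write_index("ab.pdf", x=0, y=0, width=300, height=120) would then create ab.txt; the coordinates here are placeholders and would have to be measured on a real report. Fields whose pattern does not match are simply left out of the line, so a missing ACC= entry is a sign that the area or the pattern needs adjusting.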